The policy iteration method is used in the solution process.
In this paper, the policy iteration method is applied to solve the problem.
An appropriate choice of basis functions directly influences the learning performance of a policy iteration method during value function approximation.
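The algorithm first performs a multiple sequence alignment with a progressive method. It then applies an iterative strategy, using the previous round's alignment result to revise the guide tree and produce a new round of alignment.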
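In the value function approximation of policy-iteration reinforcement learning methods, a sound choice of basis functions directly affects the method's performance.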
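For readers unfamiliar with the term, policy iteration alternates two steps: evaluate the current policy, then improve it greedily, and stop once the policy no longer changes. The following is a minimal tabular sketch, not the method from any of the sentences above. The function name `policy_iteration`, the array layout (`P[a, s, s']`, `R[s, a]`) and the two-state toy problem are assumptions for illustration. It uses exact table lookups and does not perform the basis-function value approximation mentioned above.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Tabular policy iteration (illustrative sketch).

    P[a, s, t]: probability of moving from state s to state t under action a (assumed layout).
    R[s, a]:    expected immediate reward for taking action a in state s (assumed layout).
    """
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)  # start with action 0 everywhere
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[policy, states, :]  # row s is P[policy[s], s, :]
        R_pi = R[states, policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to Q(s, a).
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        new_policy = np.argmax(Q, axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V  # policy is stable, so it is optimal
        policy = new_policy

# Hypothetical toy MDP: action 0 = stay, action 1 = switch state.
# Any action taken in state 1 earns reward 1; state 0 earns nothing.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
R = np.array([[0.0, 0.0], [1.0, 1.0]])

policy, V = policy_iteration(P, R)
print(policy, V)  # [1 0] [ 9. 10.]
```

In this toy problem the optimal policy is to switch out of state 0 and stay in state 1. That gives V(1) = 1 / (1 - 0.9) = 10 and V(0) = 0.9 × 10 = 9. The sketch solves the linear system exactly, which only works for small state spaces. The function-approximation variants referred to above replace this step with an approximation built from basis functions.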